OCR Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Font Group Identification Using Reconstructed Fonts

Internal identifier: 000563 (Main/Exploration); previous: 000562; next: 000564

Authors: Michael P. Cutter [Germany]; Joost Van Beusekom [Germany]; Faisal Shafait [Germany]; Thomas M. Breuel [Germany]

Source: Proceedings of SPIE, the International Society for Optical Engineering (ISSN 0277-786X), 2011

RBID : Pascal:11-0279184

French descriptors

English descriptors

Abstract

Ideally, digital versions of scanned documents should be represented in a format that is searchable, compressed, highly readable, and faithful to the original. These goals can theoretically be achieved through OCR and font recognition, re-typesetting the document text with original fonts. However, OCR and font recognition remain hard problems, and many historical documents use fonts that are not available in digital forms. It is desirable to be able to reconstruct fonts with vector glyphs that approximate the shapes of the letters that form a font. In this work, we address the grouping of tokens in a token-compressed document into candidate fonts. This permits us to incorporate font information into token-compressed images even when the original fonts are unknown or unavailable in digital format. This paper extends previous work in font reconstruction by proposing and evaluating an algorithm to assign a font to every character within a document. This is a necessary step to represent a scanned document image with a reconstructed font. Through our evaluation method, we have measured a 98.4% accuracy for the assignment of letters to candidate fonts in multi-font documents.


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Font Group Identification Using Reconstructed Fonts</title>
<author>
<name sortKey="Cutter, Michael P" sort="Cutter, Michael P" uniqKey="Cutter M" first="Michael P." last="Cutter">Michael P. Cutter</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>University of Kaiserslautern</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<settlement type="city">Kaiserslautern</settlement>
<region type="land" nuts="2">Rhénanie-Palatinat</region>
</placeName>
<orgName type="university">Université de technologie de Kaiserslautern</orgName>
</affiliation>
</author>
<author>
<name sortKey="Van Beusekom, Joost" sort="Van Beusekom, Joost" uniqKey="Van Beusekom J" first="Joost" last="Van Beusekom">Joost Van Beusekom</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>University of Kaiserslautern</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<settlement type="city">Kaiserslautern</settlement>
<region type="land" nuts="2">Rhénanie-Palatinat</region>
</placeName>
<orgName type="university">Université de technologie de Kaiserslautern</orgName>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>German Research Center for Artificial Intelligence (DFKI)</s1>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<wicri:noRegion>German Research Center for Artificial Intelligence (DFKI)</wicri:noRegion>
<wicri:noRegion>German Research Center for Artificial Intelligence (DFKI)</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Shafait, Faisal" sort="Shafait, Faisal" uniqKey="Shafait F" first="Faisal" last="Shafait">Faisal Shafait</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>University of Kaiserslautern</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<settlement type="city">Kaiserslautern</settlement>
<region type="land" nuts="2">Rhénanie-Palatinat</region>
</placeName>
<orgName type="university">Université de technologie de Kaiserslautern</orgName>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>German Research Center for Artificial Intelligence (DFKI)</s1>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<wicri:noRegion>German Research Center for Artificial Intelligence (DFKI)</wicri:noRegion>
<wicri:noRegion>German Research Center for Artificial Intelligence (DFKI)</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Breuel, Thomas M" sort="Breuel, Thomas M" uniqKey="Breuel T" first="Thomas M." last="Breuel">Thomas M. Breuel</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>University of Kaiserslautern</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<settlement type="city">Kaiserslautern</settlement>
<region type="land" nuts="2">Rhénanie-Palatinat</region>
</placeName>
<orgName type="university">Université de technologie de Kaiserslautern</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">11-0279184</idno>
<date when="2011">2011</date>
<idno type="stanalyst">PASCAL 11-0279184 INIST</idno>
<idno type="RBID">Pascal:11-0279184</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000129</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000644</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000117</idno>
<idno type="wicri:doubleKey">0277-786X:2011:Cutter M:font:group:identification</idno>
<idno type="wicri:Area/Main/Merge">000569</idno>
<idno type="wicri:Area/Main/Curation">000563</idno>
<idno type="wicri:Area/Main/Exploration">000563</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Font Group Identification Using Reconstructed Fonts</title>
<author>
<name sortKey="Cutter, Michael P" sort="Cutter, Michael P" uniqKey="Cutter M" first="Michael P." last="Cutter">Michael P. Cutter</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>University of Kaiserslautern</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<settlement type="city">Kaiserslautern</settlement>
<region type="land" nuts="2">Rhénanie-Palatinat</region>
</placeName>
<orgName type="university">Université de technologie de Kaiserslautern</orgName>
</affiliation>
</author>
<author>
<name sortKey="Van Beusekom, Joost" sort="Van Beusekom, Joost" uniqKey="Van Beusekom J" first="Joost" last="Van Beusekom">Joost Van Beusekom</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>University of Kaiserslautern</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<settlement type="city">Kaiserslautern</settlement>
<region type="land" nuts="2">Rhénanie-Palatinat</region>
</placeName>
<orgName type="university">Université de technologie de Kaiserslautern</orgName>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>German Research Center for Artificial Intelligence (DFKI)</s1>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<wicri:noRegion>German Research Center for Artificial Intelligence (DFKI)</wicri:noRegion>
<wicri:noRegion>German Research Center for Artificial Intelligence (DFKI)</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Shafait, Faisal" sort="Shafait, Faisal" uniqKey="Shafait F" first="Faisal" last="Shafait">Faisal Shafait</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>University of Kaiserslautern</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<settlement type="city">Kaiserslautern</settlement>
<region type="land" nuts="2">Rhénanie-Palatinat</region>
</placeName>
<orgName type="university">Université de technologie de Kaiserslautern</orgName>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>German Research Center for Artificial Intelligence (DFKI)</s1>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<wicri:noRegion>German Research Center for Artificial Intelligence (DFKI)</wicri:noRegion>
<wicri:noRegion>German Research Center for Artificial Intelligence (DFKI)</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Breuel, Thomas M" sort="Breuel, Thomas M" uniqKey="Breuel T" first="Thomas M." last="Breuel">Thomas M. Breuel</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>University of Kaiserslautern</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<settlement type="city">Kaiserslautern</settlement>
<region type="land" nuts="2">Rhénanie-Palatinat</region>
</placeName>
<orgName type="university">Université de technologie de Kaiserslautern</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Accuracy</term>
<term>Algorithms</term>
<term>Document image processing</term>
<term>Image compression</term>
<term>Imagery</term>
<term>Optical character recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Imagerie</term>
<term>Algorithme</term>
<term>Compression image</term>
<term>Reconnaissance optique caractère</term>
<term>Traitement image document</term>
<term>Précision</term>
<term>0130C</term>
<term>4230</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Ideally, digital versions of scanned documents should be represented in a format that is searchable, compressed, highly readable, and faithful to the original. These goals can theoretically be achieved through OCR and font recognition, re-typesetting the document text with original fonts. However, OCR and font recognition remain hard problems, and many historical documents use fonts that are not available in digital forms. It is desirable to be able to reconstruct fonts with vector glyphs that approximate the shapes of the letters that form a font. In this work, we address the grouping of tokens in a token-compressed document into candidate fonts. This permits us to incorporate font information into token-compressed images even when the original fonts are unknown or unavailable in digital format. This paper extends previous work in font reconstruction by proposing and evaluating an algorithm to assign a font to every character within a document. This is a necessary step to represent a scanned document image with a reconstructed font. Through our evaluation method, we have measured a 98.4% accuracy for the assignment of letters to candidate fonts in multi-font documents.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Allemagne</li>
</country>
<region>
<li>Rhénanie-Palatinat</li>
</region>
<settlement>
<li>Kaiserslautern</li>
</settlement>
<orgName>
<li>Université de technologie de Kaiserslautern</li>
</orgName>
</list>
<tree>
<country name="Allemagne">
<region name="Rhénanie-Palatinat">
<name sortKey="Cutter, Michael P" sort="Cutter, Michael P" uniqKey="Cutter M" first="Michael P." last="Cutter">Michael P. Cutter</name>
</region>
<name sortKey="Breuel, Thomas M" sort="Breuel, Thomas M" uniqKey="Breuel T" first="Thomas M." last="Breuel">Thomas M. Breuel</name>
<name sortKey="Shafait, Faisal" sort="Shafait, Faisal" uniqKey="Shafait F" first="Faisal" last="Shafait">Faisal Shafait</name>
<name sortKey="Shafait, Faisal" sort="Shafait, Faisal" uniqKey="Shafait F" first="Faisal" last="Shafait">Faisal Shafait</name>
<name sortKey="Van Beusekom, Joost" sort="Van Beusekom, Joost" uniqKey="Van Beusekom J" first="Joost" last="Van Beusekom">Joost Van Beusekom</name>
<name sortKey="Van Beusekom, Joost" sort="Van Beusekom, Joost" uniqKey="Van Beusekom J" first="Joost" last="Van Beusekom">Joost Van Beusekom</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000563 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000563 | SxmlIndent | more
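
Outside Dilib, the record shown above could also be inspected with a general-purpose XML library. The sketch below is only an illustration, not part of the Dilib toolchain. It assumes the record text has already been saved or captured as a string. The function name `summarize` and the `SAMPLE` excerpt are hypothetical. One detail worth noting: the record uses the prefixes `wicri:` and `inist:` without declaring them, so a strict parser rejects it as-is. The sketch works around this by rewriting those prefixes in tag and attribute names before parsing.

```python
import re
import xml.etree.ElementTree as ET

# Matches an undeclared "wicri:" or "inist:" prefix at the start of a tag name
# ("<wicri:..." or "</wicri:...") or of an attribute name (" wicri:level=").
# Assumption: these prefixes do not occur after whitespace in text content.
_PREFIX = re.compile(r'(</?|\s)(wicri|inist):')


def summarize(record_text):
    """Return (title, [author names]) from the titleStmt of a record."""
    # Turn "wicri:level" into "wicri_level" so the XML parses without
    # namespace declarations; attribute values such as type="wicri:source"
    # are left untouched because they follow a quote, not whitespace.
    cleaned = _PREFIX.sub(r'\1\2_', record_text)
    root = ET.fromstring(cleaned)
    title_stmt = root.find('.//titleStmt')
    title = title_stmt.findtext('title')
    authors = [name.text for name in title_stmt.findall('author/name')]
    return title, authors


# Hypothetical shortened excerpt of the record, for demonstration only.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Font Group Identification Using Reconstructed Fonts</title>
<author><name>Michael P. Cutter</name>
<affiliation wicri:level="4"><inist:fA14 i1="01">
<s1>University of Kaiserslautern</s1></inist:fA14></affiliation>
</author>
</titleStmt></fileDesc></teiHeader></TEI></record>"""

if __name__ == "__main__":
    title, authors = summarize(SAMPLE)
    print(title)
    print("; ".join(authors))
```

Declaring the two prefixes on a wrapper element would be an alternative to rewriting them, but the namespace URIs are not given in the record, so the sketch avoids inventing any.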

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:11-0279184
   |texte=   Font Group Identification Using Reconstructed Fonts
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024